@ambershen

Description

This PR optimizes the sync generate() method in BaseChatModel, parallelizing LLM calls with a thread-pool executor to improve throughput when processing multiple prompts.

Changes

  • Core Optimization: Replaced the sequential loop with thread-pool executor mapping for multi-input processing in chat_models.py:904-947 (see the sketch after this list)
  • Performance: Added fast path for single input to avoid unnecessary overhead
  • Compatibility: Preserved original ordering, callback behavior, and error propagation
  • Resource Management: Used get_executor_for_config context manager for proper thread pool lifecycle
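
For illustration, a minimal sketch of the new control flow, assuming a per-input helper named `_generate_one` (hypothetical; the real diff lives in chat_models.py). Only `get_executor_for_config` is the actual utility named above:

```python
from langchain_core.runnables.config import get_executor_for_config

def _generate_all(self, messages_list, run_managers):
    # Fast path: a single input skips thread-pool setup entirely.
    if len(messages_list) == 1:
        return [self._generate_one(messages_list[0], run_managers[0])]
    # Multi-input path: executor.map yields results in input order,
    # so ordering is preserved even though calls run concurrently.
    with get_executor_for_config(None) as executor:
        return list(executor.map(self._generate_one, messages_list, run_managers))
```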

Technical Details

The refactor maintains the same API while significantly improving performance for batch processing scenarios:

  • Before: Sequential processing of each message list
  • After: Parallel processing using thread-pool executor with proper error handling
  • Error Handling: Preserved existing error propagation with on_llm_error callbacks (a worker sketch follows this list)
  • Ordering: Results are returned in the same order as input messages
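
A hedged sketch of how the per-input worker can keep the existing callback behavior; the helper name and exact arguments are illustrative, not the verbatim diff:

```python
from langchain_core.outputs import LLMResult

def _generate_one(self, messages, run_manager):
    try:
        return self._generate(messages, run_manager=run_manager)
    except BaseException as e:
        # Fire the callback, then re-raise; executor.map surfaces the
        # exception in the caller, matching the old sequential behavior.
        if run_manager:
            run_manager.on_llm_error(e, response=LLMResult(generations=[]))
        raise
```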

Performance Impact

This change improves throughput when processing multiple prompts simultaneously (see the usage example after this list); it is especially beneficial for:

  • Batch inference scenarios
  • Multi-prompt workflows
  • Applications processing multiple conversations in parallel
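
For example (the model name is illustrative; any `BaseChatModel` subclass benefits):

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
prompts = [[HumanMessage(content=f"Summarize ticket {i}")] for i in range(8)]
# All eight underlying LLM calls are dispatched to the thread pool;
# result.generations still comes back in input order.
result = model.generate(prompts)
```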

Testing

The changes preserve all existing behavior (an illustrative ordering check follows this list):

  • ✅ Error handling and callback invocation
  • ✅ Result ordering and structure
  • ✅ Single input fast path optimization
  • ✅ Resource cleanup via context manager
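
An illustrative ordering check (not part of this PR's test suite), using a minimal echoing fake model:

```python
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.outputs import ChatGeneration, ChatResult

class EchoModel(BaseChatModel):
    """Fake model that echoes the last input message back."""

    @property
    def _llm_type(self) -> str:
        return "echo"

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        msg = AIMessage(content=messages[-1].content)
        return ChatResult(generations=[ChatGeneration(message=msg)])

def test_generate_preserves_order():
    model = EchoModel()
    batches = [[HumanMessage(content=str(i))] for i in range(16)]
    result = model.generate(batches)
    # Even with parallel dispatch, generations must match input order.
    assert [g[0].message.content for g in result.generations] == [
        str(i) for i in range(16)
    ]
```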

Related

Part of the broader LLM optimization initiative documented in /Users/bytedance/langchain/langchain/.trae/documents/Optimize LLM Calls Across Codebase.md

Checklist

  • Code follows project conventions
  • Maintains backward compatibility
  • Preserves error handling behavior
  • Uses existing utilities (get_executor_for_config)
  • No breaking changes to public API

@ambershen ambershen requested a review from eyurtsev as a code owner November 19, 2025 22:58
@github-actions github-actions bot added the core (Related to the package `langchain-core`) and feature labels on Nov 19, 2025
codspeed-hq bot commented Nov 19, 2025

CodSpeed Performance Report

Merging #34043 will degrade performance by 24.16%

Comparing ambershen:optimize/llm-sync-generate-parallelization (e85d221) with master (525d5c0)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

❌ 1 regression
✅ 12 untouched
⏩ 21 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Mode      Benchmark                      BASE     HEAD     Change
WallTime  test_async_callbacks_in_sync   18.4 ms  24.3 ms  -24.16%

Footnotes

  [1] 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived on CodSpeed to remove them from the performance reports.
